Value Function Approximation


Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces

Neural Information Processing Systems

Motivated by the success of reinforcement learning (RL) for discrete-time tasks such as AlphaGo and Atari games, there has been a recent surge of interest in using RL for continuous-time control of physical systems (cf.



Policy Gradient With Value Function Approximation For Collective Multiagent Planning

Neural Information Processing Systems

Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass, called CDec-POMDP, where the collective behavior of a population of agents affects the joint reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDec-POMDP policies. Vanilla AC converges slowly on larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better-quality solutions than previous best approaches.
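To make the decomposition idea concrete, here is a minimal sketch, assuming a linear per-agent critic trained from local rewards. The function names and the linear parameterization are illustrative stand-ins, not the paper's exact method: the joint action-value is approximated as a sum of per-agent components, each updated by a TD step on that agent's local reward signal.

```python
import numpy as np

def decomposed_q(agent_features, agent_weights):
    # Approximate Q(s, a) as a sum of per-agent terms
    # f_m(s_m, a_m) = w_m . phi_m -- a linear stand-in for the
    # decomposition described in the abstract.
    return sum(w @ phi for w, phi in zip(agent_weights, agent_features))

def local_td_update(w, phi, phi_next, local_reward, gamma=0.95, alpha=0.1):
    # Train one agent's critic component from its local reward
    # rather than from the global joint reward.
    delta = local_reward + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * delta * phi
```

Summing per-agent components keeps each update cheap: an agent's weights change only in response to its own features and local reward.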



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: quality, clarity, originality, and significance. This is a very well-written paper that explores the use of weighted importance sampling to speed up learning in off-policy LSTD-type algorithms. The theoretical results are solid and what one would expect. The computational results are striking. The technique could serve as a useful component in the design of RL algorithms. Q2: Please summarize your review in 1-2 sentences. The paper is very well-written and presents a useful idea validated by striking computational results.
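For readers unfamiliar with the technique under review, here is a generic sketch of ordinary versus weighted importance sampling for off-policy return estimation. It assumes per-trajectory importance ratios `rhos` are already computed; it is not the paper's LSTD-based algorithm.

```python
import numpy as np

def ordinary_is(returns, rhos):
    # Ordinary importance sampling: unbiased, but its variance
    # grows with the spread of the importance ratios.
    return float(np.mean(rhos * returns))

def weighted_is(returns, rhos):
    # Weighted importance sampling: normalizes by the sum of the
    # ratios; biased for finite samples but typically far lower
    # variance, which is what makes it attractive in practice.
    return float(np.sum(rhos * returns) / np.sum(rhos))
```

When all ratios equal 1 (on-policy data), both estimators reduce to the sample mean of the returns; they differ only when the behavior and target policies diverge.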



Fairness in Multi-Agent Sequential Decision-Making

Chongjie Zhang, Julie A. Shah

Neural Information Processing Systems

We define a fairness solution criterion for multi-agent decision-making problems where agents have local interests. This new criterion aims to maximize the worst performance among agents while also accounting for overall performance. We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. The game-theoretic approach formulates this fairness optimization as a two-player zero-sum game and employs an iterative algorithm for finding a Nash equilibrium, which corresponds to an optimal fairness policy.
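A hedged sketch of the max-min step as a linear program, using scipy. The setup is illustrative, not the paper's formulation: given a payoff matrix `R[i, k]` holding agent i's expected return under base policy k, find a mixture over base policies that maximizes the worst agent's return.

```python
import numpy as np
from scipy.optimize import linprog

def maxmin_mixture(R):
    # Variables: mixture weights p (length K) and worst-case value z.
    # Maximize z subject to z <= R[i] . p for every agent i,
    # sum(p) = 1, p >= 0.  linprog minimizes, so we minimize -z.
    n_agents, K = R.shape
    c = np.zeros(K + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-R, np.ones((n_agents, 1))])  # z - R[i].p <= 0
    b_ub = np.zeros(n_agents)
    A_eq = np.zeros((1, K + 1))
    A_eq[0, :K] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * K + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:K], res.x[-1]
```

For R = [[1, 0], [0, 1]] (each base policy favors one agent exclusively), mixing the two equally gives each agent 0.5, which is the max-min value.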


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: quality, clarity, originality, and significance. The paper presents an interesting idea of using a robust formulation to fit a value function given aggregate states. The robust formulation leads to a much stronger error bound than what is achieved by regular approximate value iteration. The key is in using the algorithm to effectively select the weights applied to states within each aggregate. On the negative side, the authors do not do a good job of presenting their notation and algorithm clearly, and the computational results are difficult to understand.


Basis refinement strategies for linear value function approximation in MDPs

Gheorghe Comanici, Doina Precup, Prakash Panangaden

Neural Information Processing Systems

We provide a theoretical framework for analyzing basis function construction for linear value function approximation in Markov Decision Processes (MDPs). We show that important existing methods, such as Krylov bases and Bellman-error-based methods, are special cases of the general framework we develop. We provide a general algorithmic framework for computing basis function refinements which "respect" the dynamics of the environment, and we derive approximation error bounds that apply to any algorithm respecting this general framework. We also show how, using ideas related to bisimulation metrics, one can translate basis refinement into a process of finding "prototypes" that are diverse enough to represent the given MDP.
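As an example of one special case the framework covers, a Krylov basis can be generated from the reward vector and transition matrix. This is a generic numpy sketch of that standard construction, not the authors' refinement algorithm:

```python
import numpy as np

def krylov_basis(P, r, k):
    # Stack {r, Pr, P^2 r, ..., P^(k-1) r} and orthonormalize via QR;
    # the columns of Q span the order-k Krylov subspace commonly used
    # as a feature basis for linear value function approximation.
    vecs = [r]
    for _ in range(k - 1):
        vecs.append(P @ vecs[-1])
    Q, _ = np.linalg.qr(np.stack(vecs, axis=1))
    return Q
```

Orthonormalizing keeps the basis numerically well-conditioned even when successive powers of P applied to r become nearly collinear.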


14da15db887a4b50efe5c1bc66537089-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to thank the reviewers for their insightful comments. Addressing the common point of limiting our experimentation to a single-decision setting: our intent was to focus our analysis only on the effects of candidate generation. By removing the influence of other factors on the performance of search, for instance rollout policies and state value function approximations, we can focus the evaluation. We are aware that the sequential-decision setting requires extra reasoning. We would argue, though, that the other components of learning algorithms for search aim to reduce the amount of reasoning needed; indeed, learning a perfect value function approximation would essentially reduce a sequential-decision problem to a single-decision problem. However, we do plan on examining our ideas in a full MCTS setting, which we think is a problem deserving its own investigation.